Skip to content

fix: wrap insertEvent with withRetry for concurrent PostToolUse hooks#243

Closed
james-gough-op wants to merge 14 commits intomksglu:nextfrom
james-gough-op:fix/posttooluse-concurrent-sqlite-contention
Closed

fix: wrap insertEvent with withRetry for concurrent PostToolUse hooks#243
james-gough-op wants to merge 14 commits intomksglu:nextfrom
james-gough-op:fix/posttooluse-concurrent-sqlite-contention

Conversation

@james-gough-op
Copy link
Copy Markdown

What / Why / How

What: Wrap SessionDB.insertEvent() with withRetry() and propagate busy_timeout pragma to Bun's SQLite adapter.

Why: When many tool calls complete in parallel (e.g., 16 batch mcp__linear__get_issue calls during a standup summary), Claude Code spawns multiple PostToolUse hooks simultaneously. These all open the same per-project SessionDB and compete for the SQLite write lock. While better-sqlite3's busy_timeout (30s) handles most contention, edge cases during transaction lock escalation can surface SQLITE_BUSY, causing Claude Code to report PostToolUse:mcp__linear__get_issue hook error to users.

Additionally, the BunSQLiteAdapter factory was silently dropping the timeout option passed from SQLiteBase, meaning Bun runtime had no busy_timeout at all — concurrent writes would fail immediately with SQLITE_BUSY.

How:

  1. src/session/db.ts — Changed transaction() to this.withRetry(() => transaction()) in insertEvent, leveraging the existing retry infrastructure in SQLiteBase
  2. src/db-base.ts — Added busy_timeout pragma in BunDatabaseFactory when opts.timeout is provided, matching better-sqlite3's automatic behavior
  3. Added a concurrent insert test verifying multiple DB instances can write to the same file without data loss

Affected platforms

  • Claude Code
  • All platforms

(The Bun busy_timeout fix applies to any platform using Bun runtime for the MCP server. The withRetry fix applies to all platforms since PostToolUse hooks run with Node.js.)

Test plan

  • Added Concurrent Insert Resilience test in tests/session/session-db.test.ts — opens 5 SessionDB instances against the same DB file and inserts events from each, verifying all 5 events are stored without errors
  • All 26 session-db tests pass
  • npm run typecheck passes clean

Checklist

  • Tests added/updated (TDD: red → green)
  • npm test passes (6 pre-existing failures unrelated to this change — security, cursor-hooks, vscode-hooks, session-hooks-smoke, hooks/integration)
  • npm run typecheck passes
  • Docs updated if needed (README, platform-support.md) — no docs changes needed
  • No Windows path regressions (forward slashes only)
  • Targets next branch (unless hotfix)

github-actions bot and others added 9 commits April 6, 2026 06:34
When many tool calls complete in parallel (e.g., batch Linear get_issue),
Claude Code spawns multiple PostToolUse hooks simultaneously. These all
open the same per-project SessionDB and compete for the SQLite write lock.

While better-sqlite3's busy_timeout (30s) handles most contention, edge
cases during transaction lock escalation can still surface SQLITE_BUSY.
This causes hooks to fail, and Claude Code reports "hook error" to users.

Changes:
- Wrap SessionDB.insertEvent() transaction with withRetry() for
  defense-in-depth against SQLITE_BUSY under concurrent hook access
- Set busy_timeout pragma in BunSQLiteAdapter to match better-sqlite3's
  timeout option (previously ignored, causing immediate SQLITE_BUSY
  failures under Bun runtime)
- Add concurrent insert test verifying multiple DB instances can write
  to the same file without data loss
@mksglu mksglu changed the base branch from main to next April 13, 2026 20:03
mksglu added a commit that referenced this pull request Apr 13, 2026
@mksglu
Copy link
Copy Markdown
Owner

mksglu commented Apr 13, 2026

Hey @james-gough-op — great work here. Reviewed and merged onto next.

What we validated:

  • withRetry is the correct pattern for transaction lock escalation deadlocks that busy_timeout alone can't handle (same pattern already used in store.ts:609,766)
  • Bun busy_timeout propagation is a genuine gap — better-sqlite3 does it via constructor, bun:sqlite doesn't
  • 2.6s max busy-wait is within Claude Code's 60s hook timeout budget
  • Concurrent insert test passes with 5 DB instances against same file

Note: We resolved the merge conflict with next (we recently added a NodeSQLiteAdapter in db-base.ts for #228) and applied your changes on top. The Bun busy_timeout fix was placed correctly in the BunDatabaseFactory branch.

Commit: d5a5b47 on next.

Thanks for the thorough analysis of the lock escalation scenario — the transaction retry approach is exactly right. 🙏

@mksglu mksglu closed this Apr 13, 2026
sebastianbreguel added a commit to sebastianbreguel/context-mode that referenced this pull request Apr 15, 2026
…p paths

Adds unit tests for pure utilities in src/db-base.ts that back the
SQLITE_BUSY retry wrapping (mksglu#263, mksglu#243) and corruption recovery path
behind mksglu#244/mksglu#218 incidents.

- withRetry: first-try success, retry-on-busy, rethrow non-busy, empty
  delays, exhaustion message, non-Error throws, delay honored.
- isSQLiteCorruptionError: all four documented signatures + negatives.
- renameCorruptDB: main-only, all sidecars, missing files.
- cleanOrphanedWALFiles: orphan removed, live DB preserved.
- deleteDBFiles: unconditional cleanup of main/-wal/-shm.
- defaultDBPath: pid embedding, prefix, tmpdir scoping.

31 tests, all green. No changes to src/.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants